337 research outputs found

    Detecting patterns of species diversification in the presence of both rate shifts and mass extinctions

    Recent methodological advances are enabling better examination of speciation and extinction processes and patterns. A major open question is the origin of large discrepancies in species number between groups of the same age. Existing frameworks to model this diversity either focus on changes between lineages, neglecting global effects such as mass extinctions, or focus on changes over time that would affect all lineages. Yet it seems probable that both lineage differences and mass extinctions affect the same groups. Here we used simulations to test the performance of two widely used methods under complex scenarios. We report good performance, although with a tendency to over-predict events as the complexity of the scenario increases. Overall, we find that lineage shifts are better detected than mass extinctions. This work has significance for assessing the methods currently used for estimating changes in diversification using phylogenies and for developing new tests. Comment: 34 pages, 11 figures

    Evolutionary footprint of coevolving positions in genes

    Motivation: The analysis of molecular coevolution provides information on the potential functional and structural implication of positions along DNA sequences, and several methods are available to identify coevolving positions using probabilistic or combinatorial approaches. The specific nucleotide or amino acid profile associated with the coevolution process is, however, not estimated; only known profiles, such as the Watson-Crick constraint, are usually considered a priori in current measures of coevolution. Results: Here, we propose a new probabilistic model, Coev, to identify coevolving positions and their associated profile in DNA sequences while incorporating the underlying phylogenetic relationships. The process of coevolution is modeled by a 16 × 16 instantaneous rate matrix that includes rates of transition as well as a profile of coevolution. We used simulated, empirical and illustrative data to evaluate our model and to compare it with a model of 'independent' evolution using the Akaike Information Criterion. We showed that the Coev model is able to discriminate between coevolving and non-coevolving positions and provides better specificity than other available approaches. We further demonstrate that the identification of the profile of coevolution can shed new light on the process of dependent substitution during lineage evolution. Availability: http://www2.unil.ch/phylo/bioinformatics/coev Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
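
The abstract above compares a dependent (Coev) model with an independent model using the Akaike Information Criterion. The following is a minimal sketch of such a comparison, assuming the standard definition AIC = 2k - 2 ln L. The function names, log-likelihoods, and parameter counts are hypothetical placeholders, not values or code from the paper.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Standard Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def prefers_dependent(lnl_dep: float, k_dep: int,
                      lnl_indep: float, k_indep: int) -> bool:
    """Return True when the dependent model has the lower AIC."""
    return aic(lnl_dep, k_dep) < aic(lnl_indep, k_indep)


# Hypothetical numbers for one pair of sites (illustration only).
delta = aic(-120.0, 6) - aic(-112.5, 9)  # independent minus dependent
print(f"AIC difference (independent - dependent): {delta:.1f}")
```

A positive difference here would favour the dependent model for that pair of positions; how large a difference counts as convincing is a separate modelling decision.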

    Comparative Performance of Supertree Algorithms in Large Data Sets Using the Soapberry Family (Sapindaceae) as a Case Study

    For the last 2 decades, supertree reconstruction has been an active field of research and has seen the development of a large number of major algorithms. Because of the growing popularity of the supertree methods, it has become necessary to evaluate the performance of these algorithms to determine which are the best options (especially with regard to the supermatrix approach that is widely used). In this study, seven of the most commonly used supertree methods are investigated by using a large empirical data set (in terms of number of taxa and molecular markers) from the worldwide flowering plant family Sapindaceae. Supertree methods were evaluated using several criteria: similarity of the supertrees with the input trees, similarity between the supertrees and the total evidence tree, level of resolution of the supertree and computational time required by the algorithm. Additional analyses were also conducted on a reduced data set to test if the performance levels were affected by the heuristic searches rather than the algorithms themselves. Based on our results, two main groups of supertree methods were identified: on one hand, the matrix representation with parsimony (MRP), MinFlip, and MinCut methods performed well according to our criteria, whereas the average consensus, split fit, and most similar supertree methods showed a poorer performance or at least did not behave the same way as the total evidence tree. Results for the super distance matrix, that is, the most recent approach tested here, were promising with at least one derived method performing as well as MRP, MinFlip, and MinCut. The output of each method was only slightly improved when applied to the reduced data set, suggesting a correct behavior of the heuristic searches and a relatively low sensitivity of the algorithms to data set sizes and missing data. 
Results also showed that the MRP analyses could reach a high level of quality even when using a simple heuristic search strategy, with the exception of MRP with the Purvis coding scheme and reversible parsimony. The future of supertrees lies in the implementation of a standardized heuristic search for all methods and the increase in computing power to handle large data sets. The latter would prove to be particularly useful for promising approaches such as the maximum quartet fit method, which still requires substantial computing power.

    Molecular characteristics of the CupB chaperone-usher pathway and the Tps4 two-partner secretion system in Pseudomonas aeruginosa

    The opportunistic human pathogen Pseudomonas aeruginosa is a threat to immunocompromised individuals, a major cause of nosocomial infections, and is prevalent in patients with cystic fibrosis. The bacterium can form biofilms that help it evade the immune response. It adheres to host cells using molecular adhesins, such as pili assembled by chaperone-usher pathways (Cup). Understanding the adhesion could, therefore, help develop treatments that prevent the establishment of infections. This thesis considers the CupB system, consisting of an usher (CupB3), two chaperones (CupB2 and CupB4) and two pilin subunits (CupB1 and CupB6). The chaperones target the pilin subunits to the usher, which assembles a CupB1-containing pilus with a putative CupB6 adhesin at its tip. The cupB operon also encodes the TpsA-like (two-partner secretion) protein CupB5, previously suggested to be secreted by CupB3. The aim of this work was to understand the CupB1-containing pilus assembly and the CupB5 secretion mechanism. Genetic and biochemical techniques were used, such as deletion or point mutation, qRT-PCR, pull-down assays, shearing assays, and protein structure prediction or resolution. They led to the following results. First, each chaperone likely has a cognate substrate: CupB1 interacts with CupB2 and CupB6 with CupB4. Second, the crystal structure solved for CupB6 showed that it has a pilin domain and a putative adhesin domain, connected by a poly-proline linker. Third, CupB5 secretion was observed to be CupB3-independent and TpsB4-dependent. tpsB4 is encoded with its substrate tpsA4. The expression of the cupB and tpsB4/tpsA4 operons was shown to be controlled by the same regulatory pathway, Roc1, and deletion of the tpsB4 transporter gene abolished CupB5 secretion. Fourth, a structural analysis indicated that TpsB4 has two POTRA domains, and POTRA-1 interacts with the highly homologous TPS motifs of CupB5 and TpsA4. 
Based on these results, the thesis presents a model of CupB pilus assembly and CupB5 secretion.

    Population Genetic Structure and Demographic History of Primula fasciculata in Southwest China

    Understanding the factors that drive the genetic structure of a species and its responses to past climatic changes is an important first step in modern population management. The response to the last glacial maximum (LGM) has been well studied; however, the effect of previous glaciation periods on plant demographic history remains poorly understood. Here we investigated the population structure and demographic history of Primula fasciculata, which occurs widely in the Hengduan Mountains and Qinghai-Tibetan Plateau. We obtained genomic data for 234 samples of the species using restriction site-associated DNA (RAD) sequencing and combined approximate Bayesian computation (ABC) and species distribution modeling (SDM) to evaluate the effects of multiple glaciation periods by testing several population divergence models and demographic scenarios. The analyses of population structure showed that P. fasciculata displays a striking population structure with six groups that could be identified genetically. Our ABC modeling suggested that the current groups diverged from ancestral populations located in the eastern Hengduan Mountains after the largest glaciation occurred in the region (~0.8-0.5 million years ago), which is consistent with the result of SDMs. Each current group survived in different glacial refugia during the LGM and experienced expansions and/or bottlenecks since their divergence during or across the following Quaternary glacial cycles. Our study demonstrates the usefulness of population genomics for evaluating the effects of past climatic changes in alpine plant species with shallow population structure.

    gcodeml: A Grid-enabled Tool for Detecting Positive Selection in Biological Evolution

    One of the important questions in biological evolution is whether certain changes along protein-coding genes have contributed to the adaptation of species. This problem is known to be biologically complex and computationally very expensive. It therefore requires efficient Grid or cluster solutions to overcome the computational challenge. We have developed a Grid-enabled tool (gcodeml) that relies on the PAML (codeml) package to help analyse large phylogenetic datasets on both Grids and computational clusters. Although we report on results for gcodeml, our approach is applicable and customisable to related problems in biology or other scientific domains. Comment: 10 pages, 4 figures. To appear in the HealthGrid 2012 conference

    Bayesian Estimation of Speciation and Extinction from Incomplete Fossil Occurrence Data

    The temporal dynamics of species diversity are shaped by variations in the rates of speciation and extinction, and there is a long history of inferring these rates using first and last appearances of taxa in the fossil record. Understanding diversity dynamics critically depends on unbiased estimates of the unobserved times of speciation and extinction for all lineages, but the inference of these parameters is challenging due to the complex nature of the available data. Here, we present a new probabilistic framework to jointly estimate species-specific times of speciation and extinction and the rates of the underlying birth-death process based on the fossil record. The rates are allowed to vary through time independently of each other, and the probability of preservation and sampling is explicitly incorporated in the model to estimate the true lifespan of each lineage. We implement a Bayesian algorithm to assess the presence of rate shifts by exploring alternative diversification models. Tests on a range of simulated data sets reveal the accuracy and robustness of our approach against violations of the underlying assumptions and various degrees of data incompleteness. Finally, we demonstrate the application of our method with the diversification of the mammal family Rhinocerotidae and reveal a complex history of repeated and independent temporal shifts of both speciation and extinction rates, leading to the expansion and subsequent decline of the group. The estimated parameters of the birth-death process implemented here are directly comparable with those obtained from dated molecular phylogenies. Thus, our model represents a step towards integrating phylogenetic and fossil information to infer macroevolutionary processes. [BDMCMC; biodiversity trends; Birth-death process; incomplete fossil sampling; macroevolution; species rise and fall.]
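
For readers unfamiliar with the birth-death process mentioned in the abstract above, here is a minimal simulation sketch. It assumes constant per-lineage speciation and extinction rates, which is a simplification: the abstract describes rates that vary through time and an explicit preservation model, neither of which is reproduced here. The function name and parameter values are hypothetical.

```python
import random


def simulate_birth_death(birth, death, t_max, n0=1, seed=None):
    """Gillespie-style simulation of lineage counts under constant rates.

    Returns a list of (time, n_lineages) pairs, starting at (0.0, n0).
    """
    if birth < 0 or death < 0 or birth + death <= 0:
        raise ValueError("rates must be non-negative and not both zero")
    rng = random.Random(seed)
    t, n = 0.0, n0
    history = [(t, n)]
    while n > 0:
        # Waiting time to the next event is exponential with rate n*(birth+death).
        t += rng.expovariate(n * (birth + death))
        if t > t_max:
            break
        # The event is a speciation with probability birth / (birth + death).
        if rng.random() < birth / (birth + death):
            n += 1
        else:
            n -= 1
        history.append((t, n))
    return history


# Illustrative rates only; not estimates from any data set.
for time, lineages in simulate_birth_death(0.3, 0.2, t_max=5.0, seed=1)[:5]:
    print(f"t={time:.3f}  lineages={lineages}")
```

Keeping `t_max` small matters when the speciation rate exceeds the extinction rate, because the expected number of lineages then grows exponentially.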

    Phylogenomics of Gesneriaceae using targeted capture of nuclear genes.

    Gesneriaceae (ca. 3400 species) is a pantropical plant family with a wide range of growth forms and floral morphologies that are associated with repeated adaptations to different environments and pollinators. Although Gesneriaceae systematics has been largely improved by the use of Sanger sequencing data, our understanding of the evolutionary history of the group is still far from complete due to the limited number of informative characters provided by this type of data. To overcome this limitation, we developed a Gesneriaceae-specific gene capture kit targeting 830 single-copy loci (776,754 bp in total), including 279 genes from the Universal Angiosperms-353 kit. With an average of 557,600 reads and 87.8% gene recovery, our target capture was successful across the family Gesneriaceae and also in other families of Lamiales. From our bait set, we selected the most informative 418 loci to resolve phylogenetic relationships across the entire Gesneriaceae family using maximum likelihood and coalescent-based methods. Upon testing the phylogenetic performance of our baits on 78 taxa representing 20 out of 24 subtribes within the family, we showed that our data provided high support for the phylogenetic relationships among the major lineages and were able to provide high resolution within more recent radiations. Overall, the molecular resources we developed here open new perspectives for the study of Gesneriaceae phylogeny at different taxonomical levels and the identification of the factors underlying the diversification of this plant group.

    Optimization strategies for fast detection of positive selection on phylogenetic trees

    Motivation: The detection of positive selection is widely used to study gene and genome evolution, but its application remains limited by the high computational cost of existing implementations. We present a series of computational optimizations for more efficient estimation of the likelihood function on large-scale phylogenetic problems. We illustrate our approach using the branch-site model of codon evolution. Results: We introduce novel optimization techniques that substantially outperform both CodeML from the PAML package and our previously optimized sequential version SlimCodeML. These techniques can also be applied to other likelihood-based phylogeny software. Our implementation scales well for large numbers of codons and/or species. It can therefore analyse substantially larger datasets than CodeML. We evaluated FastCodeML on different platforms and measured average sequential speedups of FastCodeML (single-threaded) versus CodeML of up to 5.8, average speedups of FastCodeML (multi-threaded) versus CodeML on a single node (shared memory) of up to 36.9 for 12 CPU cores, and average speedups of the distributed FastCodeML versus CodeML of up to 170.9 on eight nodes (96 CPU cores in total). Availability and implementation: ftp://ftp.vital-it.ch/tools/FastCodeML/. Contact: [email protected] or [email protected]
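
As a rough reading aid for the speedup figures quoted in the abstract above, the sketch below divides each reported speedup by the number of CPU cores used. This is only indicative: the abstract reports each figure as an "up to" average measured against CodeML, so the per-core ratios are back-of-the-envelope values, not results from the paper. The function name is hypothetical.

```python
def speedup_per_core(speedup: float, cores: int) -> float:
    """Speedup divided by core count (1.0 would mean one CodeML-equivalent per core)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


# (label, reported speedup versus CodeML, CPU cores) taken from the abstract.
reported = [
    ("single-threaded", 5.8, 1),
    ("multi-threaded, one node", 36.9, 12),
    ("distributed, eight nodes", 170.9, 96),
]

for label, speedup, cores in reported:
    print(f"{label}: {speedup_per_core(speedup, cores):.2f}x per core")
```

Values above 1.0 per core are possible here because the baseline is CodeML rather than single-threaded FastCodeML, which the abstract already reports as faster than CodeML.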